New language models using phrase structures extracted from parse trees
نویسندگان
چکیده
This paper proposes a new speech recognition scheme using three linguistic constraints. Multi-class composite bigram models [1] are used in the first and second passes to reflect word-neighboring characteristics as an extension of conventional word n-gram models. Trigram models with constituent boundary markers and word pattern models are both used in the third pass to utilize phrasal constraints and headword cooccurrences, respectively. These two models are made using a training text corpus with phrase structures given by an examplebased Transfer-Driven Machine Translation (TDMT) parser [2]. Speech recognition experiments show that the new recognition scheme reduces word errors 9.50% from the conventional scheme by using word-neighboring characteristics, that is only the multi-class composite bigram models.
منابع مشابه
Squibs: Prepositional Phrase Attachment without Oracles
Work on prepositional phrase (PP) attachment resolution generally assumes that there is an oracle that provides the two hypothesized structures that we want to choose between. The information that there are two possible attachment sites and the information about the lexical heads of those phrases is usually extracted from gold-standard parse trees. We show that the performance of reattachment m...
متن کاملStructure Alignment Using Bilingual Chunking
A new statistical method called “bilingual chunking” for structure alignment is proposed. Different with the existing approaches which align hierarchical structures like sub-trees, our method conducts alignment on chunks. The alignment is finished through a simultaneous bilingual chunking algorithm. Using the constrains of chunk correspondence between source language (SL)1 and target language (...
متن کاملPrepositional Phrase Attachment without Oracles
Work on PP attachment resolution generally assumes that there is an oracle that provides the two hypothesized structures that we want to choose between. The information that there are two possible attachment sites and the information about the lexical heads of those phrases is usually extracted from gold standard parse trees. We show that the performance of reattachment methods is higher with s...
متن کاملLexicalized and Statistical Parsing of Natural Language Text in Tamil using Hybrid Language Models
Parsing is an important process of Natural Language Processing (NLP) and Computational Linguistics which is used to understand the syntax and semantics of a natural language (NL) sentences confined to the grammar. Parser is a computational system which processes input sentence according to the productions of the grammar, and builds one or more constituent structures which conform to the grammar...
متن کاملSyntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling
We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniq...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001